# Cosine Similarity

## Application Context

TV Shows
- 0: Game of Thrones
- 1: Peaky Blinders
- 2: Hannah Montana
- 3: The IT Crowd

Users
- 0: Antar
- 1: Tameem
- 2: Logain
- 3: Jumana

Watching Record
- Antar watched 30 episodes of Game of Thrones and 35 episodes of Peaky Blinders.
- Tameem watched 5 episodes of Game of Thrones and 30 episodes of The IT Crowd.
- Logain watched 32 episodes of Hannah Montana and 20 episodes of The IT Crowd.
- Jumana watched 34 episodes of Hannah Montana and 5 episodes of The IT Crowd.

Matrix (rows: users, columns: TV shows)
\begin{bmatrix}
30 & 35 & 0 & 0 \\
5 & 0 & 0 & 30 \\
0 & 0 & 32 & 20 \\
0 & 0 & 34 & 5
\end{bmatrix}

Intuitively, Logain and Jumana have similar tastes, while Logain and Antar differ significantly.

## Geometric Interpretation of Matrix Rows as Vectors

- In GeoGebra, visualize the users as 2D vectors, restricted to only two TV shows. Interpret the angle between two vectors as a measure of how similar the users' tastes are: the smaller the angle, the more similar the tastes.
- Observe the undesired behaviour: the angle stays the same regardless of the vectors' lengths.
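
The length-independence noted above can also be checked numerically. The following is a minimal sketch, assuming three hypothetical users who watched only two shows; the helper `angle_degrees` and the vectors `casual`, `binge`, and `other` are illustrative names, not part of the notebook.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between vectors u and v, in degrees."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical users: episodes watched of two shows
casual = np.array([1, 2])
binge = np.array([10, 20])   # same proportions, ten times the volume
other = np.array([2, 1])     # opposite preference

print(angle_degrees(casual, binge))  # 0 up to floating-point rounding
print(angle_degrees(casual, other))  # about 36.87
```

The first pair is reported as identical in taste even though one user watched ten times as many episodes as the other.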

## Dot Product

- Geometric intuition: for the angle $\theta$ between vectors $u$ and $v$, think of $u_v = |u| \cos \theta$ as the projection of $u$ onto $v$. The dot product $u \cdot v = |u| |v| \cos \theta$ multiplies the two magnitudes, but the greater the angle (up to $90^\circ$), the smaller the product. Observe the extreme cases $\theta = 0^\circ$ and $\theta = 90^\circ$.
- Learn more from [Paul's Notes, Calculus II, Vectors](https://tutorial.math.lamar.edu/Classes/CalcII/DotProduct.aspx) or [OpenStax, Vectors in Space](https://math.libretexts.org/Bookshelves/Calculus/Calculus_(OpenStax)/12%3A_Vectors_in_Space/12.03%3A_The_Dot_Product).
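
Written out, the relation behind this intuition is the standard identity between the algebraic and geometric forms of the dot product. Solving it for $\cos \theta$ gives the cosine similarity:

$$
u \cdot v = \sum_i u_i v_i = |u| \, |v| \cos \theta
\quad \Longrightarrow \quad
\cos \theta = \frac{u \cdot v}{|u| \, |v|}
$$

At $\theta = 0^\circ$ the dot product equals $|u| \, |v|$, so the similarity is 1; at $\theta = 90^\circ$ it is 0. As a worked example, using the rows for Logain and Jumana from the matrix above:

$$
\cos \theta = \frac{32 \cdot 34 + 20 \cdot 5}{\sqrt{32^2 + 20^2} \, \sqrt{34^2 + 5^2}} = \frac{1188}{\sqrt{1424} \, \sqrt{1181}} \approx 0.916
$$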

## Computation in Python

In [None]:
import numpy as np
from numpy.linalg import norm

In [None]:
# user x TV-shows matrix
M = np.array([
    [30, 35, 0, 0],
    [5, 0, 0, 30],
    [0, 0, 32, 20],
    [0, 0, 34, 5]
])

In [None]:
# Input:  vectors A and B (1-D arrays of equal length, neither all zeros)
# Output: cosine similarity of A and B
def cosineSimilarity(A, B):
    return np.dot(A, B) / (norm(A) * norm(B))

In [None]:
# Similarity between Logain and Antar
cosineSimilarity(M[2], M[0])

0.0

In [None]:
# Similarity between Logain and Tameem
cosineSimilarity(M[2], M[1])

0.5227877340566137

In [None]:
# Similarity between Logain and Jumana
cosineSimilarity(M[2], M[3])

0.9160865295804294
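
The pairwise calls above can be collected into a single user-by-user similarity matrix. The following is a minimal sketch, assuming no user has an all-zero row (which would cause a division by zero); the names `row_norms`, `M_unit`, and `S` are illustrative and not part of the notebook.

```python
import numpy as np

# user x TV-shows matrix (same values as M above)
M = np.array([
    [30, 35, 0, 0],
    [5, 0, 0, 30],
    [0, 0, 32, 20],
    [0, 0, 34, 5]
])

# Scale each row to unit length; keepdims keeps shape (4, 1) for broadcasting.
row_norms = np.linalg.norm(M, axis=1, keepdims=True)
M_unit = M / row_norms

# Entry (i, j) is the cosine similarity between user i and user j.
S = M_unit @ M_unit.T
print(np.round(S, 3))
```

The matrix is symmetric with ones on the diagonal, and `S[2, 3]` reproduces the similarity between Logain and Jumana computed above.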

## Tasks

- Read section 7.2.3 Cosine Similarity in Falk.
- We modeled similarity among users. What kind of matrix or data do you need to measure similarity among TV shows? Feel free to explore datasets for inspiration.
- Given the flaw that the angle is the same regardless of the vectors' lengths, design a solution to overcome it. Note that the problem becomes significant when the difference in total watched episodes between two users is large.
- Design a recommendation engine that recommends TV shows based on similarity among users.